23 research outputs found

Dwarfs on Accelerators: Enhancing OpenCL Benchmarking for Heterogeneous Computing Architectures

Authors: Johnston Beau; Milthorpe Josh
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/05/2018

For reasons of both performance and energy efficiency, high-performance computing (HPC) hardware is becoming increasingly heterogeneous. The OpenCL framework supports portable programming across a wide range of computing devices and is gaining influence in programming next-generation accelerators. Characterizing the performance of these devices across a range of applications requires a diverse, portable and configurable benchmark suite, and OpenCL is an attractive programming model for this purpose. We present an extended and enhanced version of the OpenDwarfs OpenCL benchmark suite, with a strong focus placed on the robustness of applications, curation of additional benchmarks with an increased emphasis on correctness of results and choice of problem size. Preliminary results and analysis are reported for eight benchmark codes on a diverse set of architectures: three Intel CPUs, five Nvidia GPUs, six AMD GPUs and a Xeon Phi. Comment: 10 pages, 5 figures

arXiv.org e-Print Archive

Crossref

Associated Legendre Polynomials and Spherical Harmonics Computation for Chemistry Applications

Authors: Limpanuparb Taweetham; Milthorpe Josh
Publication date: 07/10/2014

Associated Legendre polynomials and spherical harmonics are central to calculations in many fields of science and mathematics, not only chemistry but also computer graphics, magnetism, seismology and geodesy. A number of algorithms for these functions have been published since 1960, but none of them satisfies our requirements. In this paper, we present a comprehensive review of algorithms in the literature and, based on them, propose an efficient and accurate code for quantum chemistry. Our requirements are to efficiently calculate these functions for all non-negative integer degrees and orders up to a given number (<=1000), with the absolute or the relative error of each calculated value not exceeding 10E-10. We achieve this by normalizing the polynomials, employing efficient and stable recurrence relations, and precomputing coefficients. The algorithm presented here is straightforward and may be used in other areas of science. Comment: The 40th Congress on Science and Technology of Thailand (STT40)

arXiv.org e-Print Archive

CiteSeerX

Efficient update of ghost regions using active messages

Authors: Milthorpe Josh; Rendell Alistair
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/02/2016

The use of ghost regions is a common feature of many distributed grid applications. A ghost region holds local read-only copies of remotely-held boundary data which are exchanged and cached many times over the course of a computation. X10 is a modern par

The Australian National University

PGAS-FMM: Implementing a distributed fast multipole method using the X10 programming language

Authors: Huber Thomas; Milthorpe Josh; Rendell Alistair
Publication venue: 'Wiley'
Publication date: 10/12/2015

The fast multipole method (FMM) is a complex, multi-stage algorithm over a distributed tree data structure, with multiple levels of parallelism and inherent data locality. X10 is a modern partitioned global address space language with support for asynchr

The Australian National University

AIWC: OpenCL-Based architecture-independent workload characterization

Authors: Johnston Beau; Milthorpe Josh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/08/2019

Measuring performance-critical characteristics of application workloads is important both for developers, who must understand and optimize the performance of codes, and for designers and integrators of HPC systems, who must ensure that compute architectures are suitable for the intended workloads. However, if these workload characteristics are tied to architectural features that are specific to a particular system, they may not generalize well to alternative or future systems. An architecture-independent method ensures an accurate characterization of inherent program behaviour, without bias due to architecture-dependent features that vary widely between different types of accelerators. This work presents the first architecture-independent workload characterization framework for heterogeneous compute platforms, proposing a set of metrics determining the suitability and performance of an application on any parallel HPC architecture. The tool, AIWC, is a plugin for the open-source Oclgrind simulator. It supports parallel workloads and is capable of characterizing OpenCL codes currently in use in the supercomputing setting. AIWC simulates an OpenCL device by directly interpreting LLVM instructions, and the resulting metrics may be used for performance prediction and developer feedback to guide device-specific optimizations. An evaluation of the metrics collected over a subset of the Extended OpenDwarfs Benchmark Suite is also presented.

The Australian National University

Learning to live with errors: a fresh look at floating-point computation

Authors: Milthorpe Josh; Rendell Alistair
Publication venue: Conference Organising Committee

The Australian National University